This report is an analysis of the loan data from Prosper. The Prosper loan data (pld) contains more than 110,000 loans, with 81 variables describing each loan.
## 'data.frame': 113937 obs. of 82 variables:
## $ ListingKey : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
## $ ListingNumber : int 193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
## $ ListingCreationDate : Date, format: "2007-08-26" "2014-02-27" ...
## $ CreditGrade : Ord.factor w/ 7 levels "HR"<"E"<"D"<"C"<..: 4 NA 1 NA NA NA NA NA NA NA ...
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ LoanStatus : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
## $ ClosedDate : Date, format: "2009-08-14" NA ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ EstimatedEffectiveYield : num NA 0.0796 NA 0.0849 0.1832 ...
## $ EstimatedLoss : num NA 0.0249 NA 0.0249 0.0925 ...
## $ EstimatedReturn : num NA 0.0547 NA 0.06 0.0907 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : Ord.factor w/ 7 levels "HR"<"E"<"D"<"C"<..: NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperScore : num NA 7 NA 9 4 10 2 4 9 11 ...
## $ ListingCategory..numeric. : int 0 2 0 16 2 1 1 2 7 7 ...
## $ BorrowerState : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
## $ Occupation : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
## $ EmploymentStatus : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
## $ CurrentlyInGroup : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
## $ GroupKey : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
## $ DateCreditPulled : Date, format: "2007-08-26" "2014-02-27" ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ FirstRecordedCreditLine : Date, format: "2001-10-11" "1996-03-18" ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ OpenCreditLines : int 4 14 NA 5 19 17 7 6 16 16 ...
## $ TotalCreditLinespast7years : int 12 29 3 29 49 49 20 10 32 32 ...
## $ OpenRevolvingAccounts : int 1 13 0 7 6 13 6 5 12 12 ...
## $ OpenRevolvingMonthlyPayment : num 24 389 0 115 220 1410 214 101 219 219 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ TotalInquiries : num 3 5 1 1 9 2 0 16 6 6 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ AmountDelinquent : num 472 0 NA 10056 0 ...
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years : int 0 1 0 0 0 0 0 1 0 0 ...
## $ PublicRecordsLast12Months : int 0 0 NA 0 0 0 0 0 0 0 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ AvailableBankcardCredit : num 1500 10266 NA 30754 695 ...
## $ TotalTrades : num 11 29 NA 26 39 47 16 10 29 29 ...
## $ TradesNeverDelinquent..percentage. : num 0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
## $ TradesOpenedLast6Months : num 0 2 NA 0 2 0 0 0 1 1 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : Ord.factor w/ 8 levels "Not employed"<..: 4 5 8 4 7 7 4 4 4 4 ...
## $ IncomeVerifiable : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanKey : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
## $ TotalProsperLoans : int NA NA NA NA 1 NA NA NA NA NA ...
## $ TotalProsperPaymentsBilled : int NA NA NA NA 11 NA NA NA NA NA ...
## $ OnTimeProsperPayments : int NA NA NA NA 11 NA NA NA NA NA ...
## $ ProsperPaymentsLessThanOneMonthLate: int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPaymentsOneMonthPlusLate : int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPrincipalBorrowed : num NA NA NA NA 11000 NA NA NA NA NA ...
## $ ProsperPrincipalOutstanding : num NA NA NA NA 9948 ...
## $ ScorexChangeAtTimeOfListing : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanCurrentDaysDelinquent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LoanFirstDefaultedCycleNumber : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanMonthsSinceOrigination : int 78 0 86 16 6 3 11 10 3 3 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : Date, format: "2007-09-12" "2014-03-03" ...
## $ LoanOriginationQuarter : Ord.factor w/ 35 levels "Q4 2005"<"Q1 2006"<..: 8 34 6 29 32 33 31 31 33 33 ...
## $ MemberKey : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ LP_CustomerPayments : num 11396 0 4187 5143 2820 ...
## $ LP_CustomerPrincipalPayments : num 9425 0 3001 4091 1563 ...
## $ LP_InterestandFees : num 1971 0 1186 1052 1257 ...
## $ LP_ServiceFees : num -133.2 0 -24.2 -108 -60.3 ...
## $ LP_CollectionFees : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_GrossPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NetPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NonPrincipalRecoverypayments : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PercentFunded : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Recommendations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsCount : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsAmount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
## $ ListingCategory : Factor w/ 21 levels "Not Available",..: 1 3 1 17 3 2 2 3 8 8 ...
The plot above shows a histogram of loan start dates. A bin width of 30 days gives approximately the number of loans started each month. There is a gap around 2009, after which the number of loans starts to increase again. I am not sure whether the data are representative of the overall financial market. The growth in the number of loans may reflect growth in Prosper's business rather than an overall increase in loan requests.
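A 30-day bin only approximates a calendar month. Counting loans per (year, month) gives the exact monthly totals. Below is a minimal Python sketch, assuming the origination dates have already been parsed into `datetime.date` objects. `loans_per_month` is a hypothetical helper, and the dates are made up for illustration. A month with no loans does not appear in the result at all, which is how a gap like the one around 2009 shows up.

```python
from collections import Counter
from datetime import date


def loans_per_month(origination_dates):
    """Count loans per calendar month, returned as sorted ((year, month), count) pairs."""
    counts = Counter((d.year, d.month) for d in origination_dates)
    return sorted(counts.items())


# Made-up dates; the sample has no loans in July, so that month is absent from the result.
sample = [date(2009, 6, 3), date(2009, 6, 28), date(2009, 8, 1), date(2009, 8, 15), date(2009, 8, 30)]
for (year, month), n in loans_per_month(sample):
    print(f"{year}-{month:02d}: {n}")
```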
The information recorded for loans originated before July 2009 differs from that recorded for loans originated after July 2009. To keep the analysis consistent, I only consider loans originated after July 2009.
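The filtering step amounts to a single date comparison. The sketch below is in Python, with loans represented as dicts keyed by the dataset's column names. The report only says "after July 2009", so the `CUTOFF` value and the inclusive `>=` comparison are assumptions. The two dates are the LoanOriginationDate values of the first two rows in the str() output, and the `LoanKey` values are placeholders.

```python
from datetime import date

# Assumed cutoff; the exact boundary used in the report is not stated.
CUTOFF = date(2009, 7, 1)


def keep_post_cutoff(loans, cutoff=CUTOFF):
    """Return only loans whose LoanOriginationDate is on or after the cutoff."""
    return [loan for loan in loans if loan["LoanOriginationDate"] >= cutoff]


loans = [
    {"LoanKey": "a", "LoanOriginationDate": date(2007, 9, 12)},  # placeholder key
    {"LoanKey": "b", "LoanOriginationDate": date(2014, 3, 3)},   # placeholder key
]
print([loan["LoanKey"] for loan in keep_post_cutoff(loans)])  # ['b']
```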
##
## Cancelled Chargedoff Completed
## 0 5342 19786
## Current Defaulted FinalPaymentInProgress
## 56576 1008 205
## Past Due (>120 days) Past Due (1-15 days) Past Due (16-30 days)
## 16 806 265
## Past Due (31-60 days) Past Due (61-90 days) Past Due (91-120 days)
## 363 313 304
The table above shows the number of loans in each loan status.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 7500 9076 13500 35000
The distribution of loan amounts is positively skewed. With a log-scaled x axis, however, the histogram looks approximately normal. Most loans are for round amounts, and the distribution peaks at the common values of $4,000, $10,000, and $15,000. The smallest loan is $1,000 and the largest is $35,000.
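The two observations above come from two simple operations. Taking log10 of each amount is what a log-scaled axis plots. Counting exact amounts exposes the spikes at round values. Below is a Python sketch with hypothetical helper names. The amounts are illustrative, and the real peaks come from the full LoanOriginalAmount column.

```python
import math
from collections import Counter


def log10_amounts(amounts):
    """Log10 of each loan amount; a histogram of these is what a log-scaled x axis shows."""
    return [math.log10(a) for a in amounts]


def most_common_amounts(amounts, k=3):
    """The k most frequent exact loan amounts, i.e. the spikes in a fine-binned histogram."""
    return [amount for amount, _ in Counter(amounts).most_common(k)]


# Illustrative amounts only.
amounts = [4000] * 3 + [10000] * 4 + [15000] * 2 + [9425, 3001]
print(most_common_amounts(amounts))                          # [10000, 4000, 15000]
print([round(x, 2) for x in log10_amounts([1000, 35000])])  # [3.0, 4.54]
```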
Loan term
##
## 12 36 60
## 1614 58825 24545
Most loans have a 36-month term, some have a 60-month term, and a small number are 12 months long.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.583 16.320 21.940 22.660 29.250 42.400
The figure above shows the distribution of the loans' annual percentage rate (APR). The original variable is a fraction rather than a percentage, so I multiplied it by 100 to make it easier to read.
Unsurprisingly, monthly payments follow the same pattern as the original loan amount.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3433 5000 5930 7083 1750000
Borrowers' monthly incomes vary widely, with a median of $5,000. The maximum recorded monthly income of $1,750,000 is most probably a mistake, since it is hard to imagine someone with that income taking a $10,000 loan. Most incomes are below $10,000 a month. Stated monthly income is provided by the borrower, so its reliability is unclear.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.150 0.220 0.259 0.320 10.010 7307
The mean debt-to-income ratio is about 0.26 (median 0.22). In other words, a typical borrower's debt obligations amount to roughly a quarter of their income. While 75% of borrowers have a debt-to-income ratio below 0.32, some have ratios close to 1. A small number of borrowers have a recorded ratio of 10.01. This is a capped value, and their actual debt-to-income ratio is greater than 10.
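Because of the cap, a recorded 10.01 should be read as a lower bound, not as a measured ratio. Below is a small Python sketch of that reading. `describe_dti` is a hypothetical helper, and `None` stands in for R's NA.

```python
DTI_CAP = 10.01  # recorded value meaning "debt-to-income ratio greater than 10"


def describe_dti(ratio):
    """Describe a DebtToIncomeRatio value in words; None stands for a missing (NA) value."""
    if ratio is None:
        return "missing"
    if ratio >= DTI_CAP:
        return "greater than 10 (capped)"
    return f"{ratio:.0%} of income"


for value in (0.22, 0.32, 10.01, None):
    print(value, "->", describe_dti(value))
```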
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 619.0 679.0 719.0 718.4 739.0 899.0
The dataset only reports a credit score bucket, with its lower and upper bounds in separate variables (CreditScoreRangeLower and CreditScoreRangeUpper). The histogram here uses the upper bound of each individual's credit score range. Overall, the credit score distribution is roughly normal, with a slight positive skew.
The graph above shows the distribution of the borrowers' credit grades. Grades in the middle of the scale are more common than those at either extreme.
Loan purpose
##
## Not Available Debt Consolidation Home Improvement
## 20 53246 6812
## Business Personal Loan Student Use
## 5315 0 280
## Auto Other Baby&Adoption
## 2244 9242 199
## Boat Cosmetic Procedure Engagement Ring
## 85 91 217
## Green Loans Household Expenses Large Purchases
## 59 1996 876
## Medical/Dental Motorcycle RV
## 1522 304 52
## Taxes Vacation Wedding Loans
## 885 768 771
This variable is the category the borrower selected as the reason for the loan. By far the most loans are for debt consolidation, probably of credit card debt. Apart from the generic "Other" category, home improvement and business are the next most common reasons for borrowing.
There are 113,937 observations, each with 81 variables describing a loan. To ensure data consistency, I removed loans originated before July 2009, which leaves 84,984 loans. Some variables describe the loan itself (LoanOriginalAmount, LoanOriginationDate, Term, BorrowerAPR, MonthlyLoanPayment, etc.). Others describe the borrower's situation as reported by the borrower, such as ListingCategory, StatedMonthlyIncome, and Occupation. Most of the variables describe the borrower's credit status and history (CurrentCreditLines, TotalCreditLinespast7years, DelinquenciesLast7Years, RevolvingCreditBalance, etc.). These variables capture the risk associated with the borrower and possibly determine the borrower's credit score (CreditScoreRangeUpper and CreditScoreRangeLower).
There are four types of variables in the dataset:
- dates, which were converted to R's Date class with the as.Date function;
- numbers, including dollar amounts, rates, and integers;
- factors, for variables such as CreditGrade, Occupation, and IncomeRange, as well as boolean variables;
- unique IDs, which can be integers or alphanumeric strings.
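Almost all variables have some missing values. However, the number of missing values is small compared with the total number of observations. For example, the summary above reports 7,307 missing DebtToIncomeRatio values out of 84,984 loans, about 8.6%. The missing values should therefore not substantially affect the conclusions.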
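Checking how much is missing per variable is a simple count. Below is a Python sketch that treats each loan as a dict, with `None` standing in for NA. `missing_share` is a hypothetical helper, and the rows are illustrative.

```python
def missing_share(rows):
    """Fraction of missing (None) values per column for a list of dict records."""
    n = len(rows)
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return {column: count / n for column, count in counts.items()}


rows = [
    {"DebtToIncomeRatio": 0.17, "BorrowerAPR": 0.165},
    {"DebtToIncomeRatio": None, "BorrowerAPR": 0.12},
    {"DebtToIncomeRatio": 0.06, "BorrowerAPR": 0.283},
    {"DebtToIncomeRatio": 0.15, "BorrowerAPR": 0.125},
]
print(missing_share(rows))  # {'DebtToIncomeRatio': 0.25, 'BorrowerAPR': 0.0}
```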
The main goal is to identify what affects APR. CreditGrade and APR are clearly related, but it would be interesting to see whether the relationship is one-to-one. It would also be interesting to see how CreditGrade could be derived from the borrower's credit history.
Besides CreditGrade, I would like to investigate whether any of the following affect the APR: income, loan purpose, loan date, loan amount, and occupation. I would also like to see which aspects of the credit history affect the credit grade the most. The variables of interest are IsBorrowerHomeowner, FirstRecordedCreditLine, CurrentCreditLines, OpenRevolvingAccounts, OpenRevolvingMonthlyPayment, RevolvingCreditBalance, DebtToIncomeRatio, and BankcardUtilization.
I created a new variable ‘CreditHistoryLength’ which is the time difference between loan date and first credit line in years.
I added another variable, CreditLoanRatio, which is the ratio of AvailableBankcardCredit to LoanOriginalAmount.
I also created a variable that holds the name of the loan category instead of the integer code stored in the dataset.
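The three derived variables can be sketched in Python as below. Several details are assumptions and not taken from the report: LoanOriginationDate is used as the "loan date", a year is taken as 365.25 days, and all function names are hypothetical. The category names are listed in the order of the Loan purpose table, which lines up with the ListingCategory..numeric. codes 0 to 20 in the str() output (for example, code 16 corresponds to factor level 17, Motorcycle). The example values come from the second row of the str() output.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption: average year length, accounting for leap years

# ListingCategory..numeric. codes 0-20 mapped to names, in the order of the Loan purpose table
CATEGORY_NAMES = [
    "Not Available", "Debt Consolidation", "Home Improvement", "Business",
    "Personal Loan", "Student Use", "Auto", "Other", "Baby&Adoption", "Boat",
    "Cosmetic Procedure", "Engagement Ring", "Green Loans", "Household Expenses",
    "Large Purchases", "Medical/Dental", "Motorcycle", "RV", "Taxes", "Vacation",
    "Wedding Loans",
]


def credit_history_length(loan_date, first_credit_line):
    """Years between FirstRecordedCreditLine and the loan date."""
    return (loan_date - first_credit_line).days / DAYS_PER_YEAR


def credit_loan_ratio(available_bankcard_credit, loan_original_amount):
    """AvailableBankcardCredit divided by LoanOriginalAmount."""
    return available_bankcard_credit / loan_original_amount


def category_name(code):
    """Translate a numeric listing category into its name."""
    return CATEGORY_NAMES[code]


# Second row of the str() output: first credit line 1996-03-18, loan originated 2014-03-03.
print(round(credit_history_length(date(2014, 3, 3), date(1996, 3, 18)), 2))  # 17.96
print(credit_loan_ratio(10266, 10000))  # 1.0266
print(category_name(2))  # Home Improvement
```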
Some variables, such as CreditGrade and IncomeRange, are expected to be ordered factors but did not have the proper level ordering, so I reordered their factor levels.
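The point of an ordered factor is that grades sort by credit quality, not alphabetically. Python has no ordered-factor type, so the sketch below uses a rank lookup as a rough stand-in. The worst-to-best order HR < E < D < C < B < A < AA follows the str() output and the grouped APR summary in this report. The helper names are hypothetical.

```python
# Rating levels from worst to best
RATING_ORDER = ["HR", "E", "D", "C", "B", "A", "AA"]
RATING_RANK = {grade: rank for rank, grade in enumerate(RATING_ORDER)}


def sort_by_rating(grades):
    """Sort grade labels by credit quality (worst first) instead of alphabetically."""
    return sorted(grades, key=RATING_RANK.__getitem__)


grades = ["A", "HR", "AA", "C", "E"]
print(sorted(grades))          # ['A', 'AA', 'C', 'E', 'HR']  (alphabetical, not meaningful)
print(sort_by_rating(grades))  # ['HR', 'E', 'C', 'A', 'AA']
```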
The date variables were also imported as strings into factors. They had to be converted to R Date variables before they could be used effectively, so I converted them with the as.Date function. The time-of-day part of these variables was ignored.
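Dropping the time of day amounts to keeping only the `YYYY-MM-DD` prefix before parsing. Below is a Python sketch. It assumes the raw values look like `2007-08-26 19:09:29.263000000`, which is an assumption about the input file and is not shown in this report. `parse_date` is a hypothetical helper, and empty strings are treated as missing.

```python
from datetime import date


def parse_date(text):
    """Parse the YYYY-MM-DD prefix of a timestamp string, dropping the time of day.

    Empty strings are treated as missing and returned as None.
    """
    text = text.strip()
    if not text:
        return None
    return date.fromisoformat(text[:10])


print(parse_date("2007-08-26 19:09:29.263000000"))  # 2007-08-26
print(parse_date(""))                               # None
```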
The loan amount distribution is positively skewed, and log-transforming the loan amount reveals an approximately normal distribution.
The plots above show the relationships between selected variables. The first row shows the relationship between APR and the other variables. The impact of credit rating on APR is very clear. There is also considerable correlation between APR and credit score, loan amount, and bank card utilization.
Looking at credit rating and loan amount, it seems people with lower credit ratings only qualified for smaller loans. Loan amounts also seem to have increased over the years in general. This might be related to the growth of Prosper's business rather than to borrowers' demand for larger loans.
The relationships between APR, credit rating, and the other variables need to be examined in more detail.
## # A tibble: 8 × 4
## ProsperRating..Alpha. APR.mean APR.median n
## <ord> <dbl> <dbl> <int>
## 1 HR 0.35606120 0.35797 6935
## 2 E 0.33055055 0.33215 9795
## 3 D 0.28058055 0.28488 14274
## 4 C 0.22612440 0.22362 18345
## 5 B 0.18403003 0.18173 15581
## 6 A 0.13890939 0.13799 14551
## 7 AA 0.09004073 0.09000 5372
## 8 NA 0.18688130 0.17018 131
As suspected, the main variable describing the APR is ProsperRating. This seems to be the variable Prosper uses to set the APR for its customers. It is therefore likely not available to a borrower before applying for a loan.
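The grouped table above (mean APR, median APR, and count per rating) is a group-by aggregation. Below is a Python sketch of the same computation on illustrative values. `apr_by_rating` is a hypothetical helper, and a rating of `None` plays the role of the NA group.

```python
from collections import defaultdict
from statistics import mean, median


def apr_by_rating(loans):
    """Group (rating, BorrowerAPR) pairs and return {rating: (mean, median, n)}.

    A rating of None plays the role of the NA group in the summary table.
    """
    groups = defaultdict(list)
    for rating, apr in loans:
        groups[rating].append(apr)
    return {rating: (mean(aprs), median(aprs), len(aprs)) for rating, aprs in groups.items()}


# Illustrative values only, not taken from the dataset.
loans = [("AA", 0.09), ("AA", 0.10), ("HR", 0.35), ("HR", 0.36), ("HR", 0.37), (None, 0.17)]
for rating, (apr_mean, apr_median, n) in apr_by_rating(loans).items():
    print(rating, round(apr_mean, 4), round(apr_median, 4), n)
```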
The graphs above show the loan APR faceted by the borrower's credit grade, which gives a more detailed view of the APR distribution within each grade. The distributions confirm the conclusions drawn from the previous boxplots. APR also generally has lower variance for people with better credit grades.
The graph above shows the credit score distribution for each credit grade. Borrowers with higher credit grades generally have higher credit scores, but credit score alone does not seem to determine a borrower's credit grade.
##
## Pearson's product-moment correlation
##
## data: pldn$CreditScoreRangeUpper and pldn$BorrowerAPR
## t = -180.4, df = 84982, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.531077 -0.521354
## sample estimates:
## cor
## -0.5262327
As expected, there is a moderately strong negative correlation between credit score and APR (r ≈ -0.53). As the credit score decreases, the APR increases.
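The t statistic and confidence interval in the cor.test output follow from r and the sample size alone. The t statistic is t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2. The 95% interval comes from the Fisher z-transform, tanh(atanh(r) ± 1.96 / sqrt(n - 3)). Below is a Python sketch with hypothetical function names. With the values reported above for CreditScoreRangeUpper and BorrowerAPR (r = -0.5262327, df = 84982, so n = 84984), it recovers the printed t and interval.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_test(r, n, z_crit=1.959964):
    """t statistic (df = n - 2) and 95% Fisher-z confidence interval for r."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return t, (math.tanh(z - half_width), math.tanh(z + half_width))


t, (lo, hi) = correlation_test(-0.5262327, 84984)
print(round(t, 1), round(lo, 4), round(hi, 4))  # -180.4 -0.5311 -0.5214
```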
Over the years, the maximum loan interest rate has decreased slightly, but overall interest rates do not seem to be significantly affected by time.
Higher loan amounts seem to have a lower APR. However, it is unlikely that the APR is lower because the loan is larger. I suspect people with lower credit scores do not qualify for larger loans, so we do not see large loans with a high APR. Meanwhile, people with better credit scores, who would get a lower APR, can get larger loans.
##
## Pearson's product-moment correlation
##
## data: pldn$CurrentCreditLines and pldn$BorrowerAPR
## t = -32.039, df = 84982, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1158856 -0.1025995
## sample estimates:
## cor
## -0.1092474
There seems to be a weak negative relationship between the number of credit lines and APR (r ≈ -0.11). People with more credit lines tend to have a lower APR, and people with a higher APR tend to have few credit lines. One possible explanation is that people with access to credit lines would only consider a loan if its APR is lower than that of their credit lines. People without credit lines have no choice but to accept a loan at a high APR.
##
## Pearson's product-moment correlation
##
## data: pldn$AvailableBankcardCredit and pldn$BorrowerAPR
## t = -116.75, df = 84982, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3775567 -0.3659686
## sample estimates:
## cor
## -0.3717771
As with credit lines, someone with access to credit through bank cards (credit cards, I presume) would likely choose a loan only if the rate is favourable. Individuals without access to bank card credit have no choice but to accept high-APR loans.
##
## Pearson's product-moment correlation
##
## data: pldn$AvailableBankcardCredit and pldn$LoanOriginalAmount
## t = 74.539, df = 84982, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2414024 0.2540238
## sample estimates:
## cor
## 0.2477236
People with more credit available through bank cards usually take larger loans, possibly because they use their bank cards for smaller borrowing needs.
Comparing the APR histograms for the two cases (loan amounts above and below the available bank card credit), the average APR is clearly higher when people take loans larger than their available bank card credit.
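The two-group comparison can be sketched in Python by splitting loans on whether LoanOriginalAmount exceeds AvailableBankcardCredit (equivalently, CreditLoanRatio below 1) and averaging BorrowerAPR in each group. The helper name is hypothetical. The example tuples are the first five rows of the str() output, with values rounded as printed. The third row has a missing AvailableBankcardCredit, so it is skipped.

```python
from statistics import mean


def mean_apr_by_credit_coverage(loans):
    """Mean BorrowerAPR for loans at or below vs. above the borrower's AvailableBankcardCredit.

    Each loan is a (LoanOriginalAmount, AvailableBankcardCredit, BorrowerAPR) tuple;
    loans with missing (None) credit are skipped.
    """
    below, above = [], []
    for amount, credit, apr in loans:
        if credit is None:
            continue
        (below if amount <= credit else above).append(apr)
    return (mean(below) if below else None, mean(above) if above else None)


loans = [
    (9425, 1500, 0.165),
    (10000, 10266, 0.12),
    (3001, None, 0.283),
    (10000, 30754, 0.125),
    (15000, 695, 0.246),
]
print(mean_apr_by_credit_coverage(loans))
```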
##
## Pearson's product-moment correlation
##
## data: pldn$BankcardUtilization and pldn$BorrowerAPR
## t = 74.523, df = 84982, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2413528 0.2539745
## sample estimates:
## cor
## 0.2476742
Besides the amount of credit available through bank cards, how much of it is already used also matters when deciding between a bank card and a loan. People with higher bank card utilization have little choice but to accept a higher APR.
##
## Pearson's product-moment correlation
##
## data: pldn$InquiriesLast6Months and pldn$BorrowerAPR
## t = 78.475, df = 84982, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2536616 0.2661996
## sample estimates:
## cor
## 0.2599415
There is some correlation between recent credit inquiries and APR (r ≈ 0.26). A likely explanation is that more inquiries indicate the borrower has been declined for other loans and has no other options.
##
## Pearson's product-moment correlation
##
## data: pldn$TradesNeverDelinquent..percentage. and pldn$BorrowerAPR
## t = -82.155, df = 84982, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2774715 -0.2650143
## sample estimates:
## cor
## -0.2712543
There is a considerable correlation between TradesNeverDelinquent (percentage) and APR (r ≈ -0.27). People with a good track record of paying off their debts are more likely to get a better APR.
##
## Pearson's product-moment correlation
##
## data: pldn$CreditHistoryLength and pldn$BorrowerAPR
## t = -23.05, df = 84982, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08550195 -0.07213895
## sample estimates:
## cor
## -0.07882399
There is little correlation between credit history length and APR (r ≈ -0.08), probably because most individuals have a long credit history. Among individuals with a very short credit history, however, a short history results in a higher APR.
My aim is to describe the APR using the variables that describe the credit history. ProsperRating explains much of the APR, but I expect it is not available to an individual before getting a loan. Among the variables expected to be openly available, credit score has the strongest correlation with APR. A higher credit score results in a lower APR, which is reasonable. However, credit score alone does not account for all the variation in APR.
Other variables, such as AvailableBankcardCredit and BankcardUtilization, capture the borrower's access to other sources of credit and affect their APR.
The loan amount and loan start date do not seem to affect the APR. Larger loans do have a lower APR on average, but it is unlikely that asking for a bigger loan would result in a lower APR. I suspect the lower APR results from those individuals having a better credit history.
Other variables, such as delinquencies and recent credit inquiries, also have an unfavourable effect, since they are associated with a higher APR.
On average, the loan amount is consistent with the credit individuals have access to through bank credit cards. People with more bank card credit get bigger loans from Prosper. When the loan amount is lower than an individual's available bank card credit, that individual also tends to get a better APR than when the loan amount is higher.
The strongest relationship is between APR and ProsperRating, with APR decreasing with higher ProsperRating.
My aim is to describe the APR using the different variables available through the credit history. We identified credit score as the most significant factor describing the APR. Next, we check whether other variables can describe the APR variation among borrowers with similar credit scores.
Most variables in this process are continuous, which makes variation in a continuous color scale hard to see. I used the cut function to break the continuous variables into factors so that the color changes are easier to see.
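Binning works like R's `cut` with its defaults. Each value falls into a right-closed interval (b[i], b[i+1]], and values outside the breaks become NA. Below is a Python sketch of that behaviour, not R's implementation. The function name and the break points are hypothetical, and the labels only resemble R's formatting.

```python
import bisect


def cut(values, breaks):
    """Bin values into right-closed intervals (b[i], b[i+1]], similar to R's cut() defaults.

    Values outside (breaks[0], breaks[-1]] map to None, as R returns NA for them.
    """
    labels = [f"({lo:g},{hi:g}]" for lo, hi in zip(breaks, breaks[1:])]
    result = []
    for v in values:
        if v is None or v <= breaks[0] or v > breaks[-1]:
            result.append(None)
        else:
            result.append(labels[bisect.bisect_left(breaks, v) - 1])
    return result


# Hypothetical BankcardUtilization values and breaks
print(cut([0, 0.25, 0.3, 1, 1.2], [0, 0.25, 0.5, 1]))
```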
The number of credit lines does not describe the variations in the APR within similar CreditScoreRange values.
We saw earlier that people with lower bank card utilization have higher credit scores, and this pattern again dominates this graph. However, within the same CreditScoreRange, people with lower BankcardUtilization generally have a lower APR.
Again, there is a significant correlation between CreditLoanRatio and CreditScoreRange: people with a higher CreditScoreRange have a higher CreditLoanRatio. The impact of CreditLoanRatio on APR is not very clear for people with very high or very low credit scores. For people with average credit scores (around 740 to 800), however, a higher CreditLoanRatio is generally associated with a lower APR.
The impact of AvailableBankcardCredit is very similar to that of CreditLoanRatio. This is expected, since CreditLoanRatio is AvailableBankcardCredit divided by the loan amount. However, it is not clear which of the two is the primary feature driving the APR.
Contrary to my expectations, TradesNeverDelinquent (percentage) does not seem to have much impact on APR within similar CreditScoreRange values. One explanation may be that the effect of TradesNeverDelinquent is already accounted for in CreditScoreRange, so it cannot describe any additional variation in APR.
Looking at the impact of InquiriesLast6Months on APR, people with 0 or 1 inquiries in the last 6 months get a better APR on average than people with more inquiries in the same CreditScoreRange.
The graph above shows that DebtToIncomeRatio partially affects the APR. Within the same CreditScoreRange, people with a lower DebtToIncomeRatio receive a lower APR for their loans.
When I looked at TradesNeverDelinquent versus APR, there was considerable correlation between the two. However, this correlation seems to be mostly captured by credit score: within the same credit score range, TradesNeverDelinquent cannot describe the APR.
On the other hand, AvailableBankcardCredit significantly affects APR, even for people in the same credit score range. The credit-to-loan ratio shows the same pattern.
Two other variables that showed a significant impact on APR were InquiriesLast6Months and DebtToIncomeRatio. Lower values of both are associated with a lower APR.
Here we have the distribution of loan amounts. With large bins and a logarithmic x-axis scale, the loan amounts appear normally distributed. With smaller bins, another phenomenon appears. Although the loans are approximately normally distributed overall, loan amounts are generally round numbers, with $4,000, $10,000, and $15,000 being the most common.
This plot shows the loan APR with respect to the rating Prosper assigned to the borrower. There is a direct and strong relationship between the rating and APR. Furthermore, the APR varies more as the ratings worsen.
The plot above depicts loan APR as a function of the borrower's credit score, with points colored by the credit available to the borrower through bank cards. First, higher credit scores yield a lower loan APR. Second, people with higher credit scores have more access to credit through bank cards. Third, among people with similar credit scores, those with access to more bank card credit can obtain a loan with a lower APR.
We looked at the loan data from Prosper, which contains more than 110,000 loans. However, some of the data features differ before and after July 2009. To ensure a consistent analysis, this report only uses loans originated after July 2009. I needed to condition some of the variables, such as the date and factor variables, to make them easier to work with. I started by exploring some interesting features such as loan amount, credit score, APR, and Prosper rating. Eventually, I decided to investigate the relationship between APR and the other features, i.e., how Prosper decides what APR a borrower gets based on their history and current status.
The strongest correlation was between APR and the rating Prosper assigns to borrowers. However, I decided not to use the Prosper rating, because I suspect Prosper calculates it itself and individuals likely do not have access to it. Besides the Prosper rating, credit score was the feature that described the APR the most. On average, a higher credit score corresponds to a lower APR, but credit score does not completely describe the APR variation. Other variables that I found to affect the APR were DebtToIncomeRatio, InquiriesLast6Months, AvailableBankcardCredit, and BankcardUtilization.
Contrary to my expectations, I could not find a clear impact on APR from the following variables: CurrentCreditLines, TradesNeverDelinquent, and LoanOriginationDate. I did not try to isolate the impact of other variables before assessing these variables, so their impact may be masked by variation due to other features.